Add OSS-Fuzz Atheris fuzzers for core serialization#60148
skypher wants to merge 3 commits into apache:main
As we have a large repo, we should maybe put this in an existing subfolder, not top-level. I would propose moving all below the |
Agreed: but we have |
Nice! Now just rebase and resolve the conflict :)
Awesome, I think it looks good now! |
Hmm, the problem we see now is that Atheris does not seem to be well prepared for our environment: Attempting to install
I believe you are somewhat connected to Atheris, @skypher. Maybe they can simply update their build and release process and produce binary wheels for all the common platforms, including ARM and macOS? Or at the very least make sure that manylinux ARM wheels are available. See https://pypi.org/project/atheris/3.0.0/#files: there are just three binary wheels, only for AMD.
Atheris PR: google/atheris#99 |
Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because the bytecode is now changing too much between Python versions, and it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap, and we are hoping to get that up and running this year. I am curious what problem this PR is trying to solve. Having some CI tests sounds useful, but why are we moving OSSFuzz tests into the fuzzing framework itself? There are a ton of Atheris fuzzers in OSSFuzz, so why is this one being moved specifically?
Hi @AidenRHall, thanks for the comment! I think there may be a small misunderstanding - we're not moving anything out of OSS-Fuzz. Airflow doesn't currently have OSS-Fuzz integration at all. This PR adds new fuzzing harnesses to the Airflow repository itself, following the standard pattern where projects maintain their fuzz targets in-tree. These aren't CI tests - they're fuzz targets intended for continuous fuzzing via OSS-Fuzz. The eventual goal would be to set up a projects/airflow/ configuration in OSS-Fuzz that points to these harnesses. Does that clarify things? |
And just to add, @AidenRHall: the idea here is that we would also like to experiment with more fuzzing ourselves in Airflow. Our general approach is that anything we add to our repo, even if it is going to be run externally by OSSFuzz, should be easy to reproduce locally. We are starting small and we want to add more fuzzing in Airflow, and @skypher was kind enough to propose this PR, which might be usable by OSSFuzz. However, if we are to make good use of fuzzing and add it in various parts of Airflow, our contributors need an easy way of iterating on it (adding new fuzzers, modifying existing ones), and this should all be runnable locally. Many of our contributors develop Airflow on Mac ARM devices; most PMC members and committers, in fact. So if we are serious about fuzzing and about getting people involved in making good use of it, we need to make it easy for them to contribute to our fuzzing. This is the main reason why we also try to use our CI to test it. While the Python version is not a blocker (we can easily run it in CI only for Python 3.11+), the lack of native ARM wheels is pretty much a blocker, given the time it takes to build Atheris and the environment needed for the build to succeed. I hope that clarifies why ARM support is so important for us.
BTW. @skypher -> you can get the Python 3.10 failure go away by adding |
Updated, thanks a lot! |
Hey again Aiden, just wondering if there's anything we can do to unblock our Atheris PR for these binaries. For your convenience, here's a copy of the link: google/atheris#99 Let us know please :-) |
@AidenRHall any update on this? Thanks! |
Adds base infrastructure for OSS-Fuzz fuzzing under `scripts/ossfuzz/`.

Includes:
- pyproject.toml with proper Python packaging (Private :: Do Not Upload)
- Dependencies on apache-airflow-core, apache-airflow-providers-standard, atheris
- Entry points for uv run support
- README documenting security model alignment and local testing
Fuzzer for DAG serialization/deserialization targeting `DagSerialization.from_dict()`. Used by Scheduler and API Server with schema validation. Input comes from DAG parsing and caching.

Includes:
- `.options` with max_len tuning
- `.dict` for structured input fuzzing
- Seed corpus with minimal DAG JSON
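For readers unfamiliar with `.dict` files: a libFuzzer dictionary is a plain-text list of quoted tokens that the mutator may splice into inputs, which helps it get past a JSON parser faster. The sketch below shows one way such a file could be rendered. The token list is purely illustrative and is not taken from this PR, and the helper names are hypothetical.

```python
from typing import Iterable


def dict_entry(token: bytes) -> str:
    """Render one token as a quoted libFuzzer dictionary line."""
    out = []
    for byte in token:
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)         # quotes and backslashes are escaped
        elif 0x20 <= byte < 0x7F:
            out.append(ch)                # printable ASCII passes through
        else:
            out.append(f"\\x{byte:02X}")  # any other byte is written as \xNN
    return '"' + "".join(out) + '"'


def render_dictionary(tokens: Iterable[bytes]) -> str:
    """Join the rendered entries, one per line, into the file contents."""
    return "\n".join(dict_entry(t) for t in tokens) + "\n"


# Illustrative tokens only (assumption): JSON punctuation plus a few keys
# that a serialized DAG document might plausibly contain.
EXAMPLE_TOKENS = [b"{", b"}", b'"dag_id"', b'"tasks"', b"null"]

if __name__ == "__main__":
    print(render_dictionary(EXAMPLE_TOKENS), end="")
```

The matching `.options` file is a small INI-style file with a `[libfuzzer]` section in which `max_len` caps the input size; the value actually chosen in this PR is not shown in the conversation.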
Adds fuzzer for Connection URI parsing, which is a security boundary (API input validation).

Target: `Connection._parse_from_uri()` and `sanitize_conn_id()`

Includes dictionary, options file, and seed corpus.
Would be great to get it in and try it :D |
Glad to see your ping! What's needed to get it merged? Are we blocked on the Atheris issue or do you think we can proceed as-is? Doesn't seem like they're willing to get this in anytime soon. |
I guess if ARM is not supported, we can try it without. That will limit local testing, but well, tough. |
Summary
Adds an upstream-owned OSS-Fuzz fuzzer suite under `ossfuzz/`.

Fuzz targets (Atheris):
- DAG serialization/deserialization (`serialized_dag_fuzz`)
- Connection URI parsing (`connection_uri_fuzz`)

Each fuzzer includes:
- `.options` files with tuned input size limits
- `.dict` files for structured input fuzzing
- seed corpus under `ossfuzz/seed_corpus/`

Security Model Alignment
These fuzzers target code paths with clear security boundaries per Airflow's security model, avoiding the "DAG author trust zone" where DAG authors are expected to run arbitrary code.
Test plan
-max_total_time=10)